One of the key activities of any IT function is to “keep the lights on” so that business operations are not impacted. IT uses the Incident Management process to achieve this objective. An incident is an unplanned interruption to an IT service, or a reduction in the quality of an IT service, that affects users and the business. The main goal of the Incident Management process is to provide a quick fix, workaround, or solution that resolves the interruption and restores the service to full capacity, so that the business is not impacted.
In most organizations, incidents are created by business and IT users, by end users and vendors if they have access to the ticketing system, and by integrated monitoring systems and tools. Assigning each incident to the appropriate person or unit in the support team is critical for user satisfaction and for better allocation of support resources. In many IT organizations, assigning incidents to the appropriate IT groups is still a manual process.
Manual assignment of incidents is time consuming and requires human effort. Human error leads to misaddressed incidents, which wastes resources. Manual assignment also increases response and resolution times, which degrades user satisfaction and customer service.
In the support process, incoming incidents are analyzed and assessed by the organization’s support teams to fulfill the request. In many organizations, better allocation and effective use of valuable support resources directly result in substantial cost savings.
Currently, incidents are created by various stakeholders (business users, IT users, and monitoring tools) within the IT Service Management tool and are assigned to Service Desk teams (L1 / L2 teams). This team reviews the incidents for correct ticket categorization and priority, and then carries out an initial diagnosis to see whether they can resolve them. Around 54% of the incidents are resolved by L1 / L2 teams. If L1 / L2 is unable to resolve an incident, they escalate or assign the ticket to functional teams from Applications and Infrastructure (L3 teams). Some incidents are assigned directly to L3 teams by either monitoring tools or callers / requestors. L3 teams carry out a detailed diagnosis and resolve the incidents. Around 56% of incidents are resolved by functional / L3 teams. If vendor support is needed, they reach out to the vendor to close the incident.
L1 / L2 needs to spend time reviewing Standard Operating Procedures (SOPs) before assigning incidents to functional teams: at least ~25-30% of incidents need an SOP review before ticket assignment, and about 15 minutes is spent on the SOP review for each incident. At least ~1 FTE of effort is needed just for assigning incidents to L3 teams.
While L1 / L2 teams assign incidents to functional groups, there have been multiple instances of incidents being assigned to the wrong functional group. Around 25% of incidents are wrongly assigned to functional teams, and functional teams need additional effort to reassign them to the right groups. During this process, some incidents sit in the queue and are not addressed in time, resulting in poor customer service.
AI techniques that classify incidents to the right functional groups can help organizations reduce resolution time and focus on more productive tasks.
To build a ticket classifier.
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import scipy
import scipy.stats as st
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
import os
!ls
drive sample_data
os.chdir('drive/My Drive/Colab Notebooks')
!ls
17flowers automate-ticket-gl-grp11.ipynb ComputerVision-P8-Shared.ipynb 'Copy of p08-ComputerVision1-contd.ipynb' 'Data - Sarcasm Detection.' FlowerModel 'Flowers - Classification' 'IMDB Dataset.csv' input_data.xlsx log p08-ComputerVision1-contd.ipynb p09 p09-ComputerVisionProj2.ipynb p10 p10-NLP.ipynb p10-NLP-P2.ipynb p3-cars.csv p3-cars.gsheet 'Part-1 - Plant Seedling Classification Data' Part3-Cars Part4-Flower sarcasm_detector.h5
pip install wordcloud
Requirement already satisfied: wordcloud in /usr/local/lib/python3.7/dist-packages (1.5.0) Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from wordcloud) (7.1.2) Requirement already satisfied: numpy>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from wordcloud) (1.19.5)
pip install fasttext==0.9.2
Successfully installed fasttext-0.9.2 pybind11-2.7.1
tickets = pd.read_excel('input_data.xlsx')
tickets.head()
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| 0 | login issue | -verified user details.(employee# & manager na... | spxjnwir pjlcoqds | GRP_0 |
| 1 | outlook | \r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail... | hmjdrvpb komuaywn | GRP_0 |
| 2 | cant log in to vpn | \r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail... | eylqgodm ybqkwiam | GRP_0 |
| 3 | unable to access hr_tool page | unable to access hr_tool page | xbkucsvz gcpydteq | GRP_0 |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 |
tickets.head(20)
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| 0 | login issue | -verified user details.(employee# & manager na... | spxjnwir pjlcoqds | GRP_0 |
| 1 | outlook | \r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail... | hmjdrvpb komuaywn | GRP_0 |
| 2 | cant log in to vpn | \r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail... | eylqgodm ybqkwiam | GRP_0 |
| 3 | unable to access hr_tool page | unable to access hr_tool page | xbkucsvz gcpydteq | GRP_0 |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 |
| 5 | unable to log in to engineering tool and skype | unable to log in to engineering tool and skype | eflahbxn ltdgrvkz | GRP_0 |
| 6 | event: critical:HostName_221.company.com the v... | event: critical:HostName_221.company.com the v... | jyoqwxhz clhxsoqy | GRP_1 |
| 7 | ticket_no1550391- employment status - new non-... | ticket_no1550391- employment status - new non-... | eqzibjhw ymebpoih | GRP_0 |
| 8 | unable to disable add ins on outlook | unable to disable add ins on outlook | mdbegvct dbvichlg | GRP_0 |
| 9 | ticket update on inplant_874773 | ticket update on inplant_874773 | fumkcsji sarmtlhy | GRP_0 |
| 10 | engineering tool says not connected and unable... | engineering tool says not connected and unable... | badgknqs xwelumfz | GRP_0 |
| 11 | hr_tool site not loading page correctly | hr_tool site not loading page correctly | dcqsolkx kmsijcuz | GRP_0 |
| 12 | unable to login to hr_tool to sgxqsuojr xwbeso... | unable to login to hr_tool to sgxqsuojr xwbeso... | oblekmrw qltgvspb | GRP_0 |
| 13 | user wants to reset the password | user wants to reset the password | iftldbmu fujslwby | GRP_0 |
| 14 | unable to open payslips | unable to open payslips | epwyvjsz najukwho | GRP_0 |
| 15 | ticket update on inplant_874743 | ticket update on inplant_874743 | fumkcsji sarmtlhy | GRP_0 |
| 16 | unable to login to company vpn | \n\nreceived from: xyz@company.com\n\nhi,\n\ni... | chobktqj qdamxfuc | GRP_0 |
| 17 | when undocking pc , screen will not come back | when undocking pc , screen will not come back | sigfdwcj reofwzlm | GRP_3 |
| 18 | erp SID_34 account locked | erp SID_34 account locked | nqdyowsm yqerwtna | GRP_0 |
| 19 | unable to sign into vpn | unable to sign into vpn | ftsqkvre bqzrupic | GRP_0 |
tickets.describe()
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| count | 8492 | 8499 | 8500 | 8500 |
| unique | 7481 | 7817 | 2950 | 74 |
| top | password reset | the | bpctwhsn kzqsbmtp | GRP_0 |
| freq | 38 | 56 | 810 | 3976 |
tickets.isna().sum()
Short description 8
Description 1
Caller 0
Assignment group 0
dtype: int64
tickets.dropna(inplace = True)
tickets.isna().sum()
Short description 0
Description 0
Caller 0
Assignment group 0
dtype: int64
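Dropping rows discards the 9 tickets that have one missing text field, even though the other field may still be usable. A minimal alternative sketch follows, assuming it is acceptable to treat a missing field as empty text. The toy frame and the helper name `fill_missing_text` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def fill_missing_text(df, text_cols):
    """Return a copy of df with NaN in the given text columns replaced by ''."""
    out = df.copy()
    out[text_cols] = out[text_cols].fillna('')
    return out

# Toy data standing in for the ticket export (hypothetical values).
toy = pd.DataFrame({
    'Short description': ['login issue', np.nan],
    'Description': [np.nan, 'vpn does not connect'],
    'Assignment group': ['GRP_0', 'GRP_0'],
})

filled = fill_missing_text(toy, ['Short description', 'Description'])
print(filled.isna().sum().sum())  # number of missing values left after filling
```

Whether filling or dropping is preferable depends on how the two text fields are used later; with only 9 affected rows either choice should have little effect.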
tickets['Caller'].unique()
array(['spxjnwir pjlcoqds', 'hmjdrvpb komuaywn', 'eylqgodm ybqkwiam', ...,
'bjitvswa yrmugfnq', 'oybwdsgx oxyhwrfz', 'kqvbrspl jyzoklfx'],
dtype=object)
tickets['Caller'].value_counts()
bpctwhsn kzqsbmtp 810
ZkBogxib QsEJzdZO 151
fumkcsji sarmtlhy 134
rbozivdq gmlhrtvp 87
rkupnshb gsmzfojw 71
...
dubpgacz kjzhilng 1
rzxfgmcu xprwayoc 1
hctajofe qgrkcxyt 1
ogfjbrlw nwakldmx 1
uojdnrvs amchenrg 1
Name: Caller, Length: 2948, dtype: int64
# The Caller values look like random strings that carry no meaning. We can potentially drop this column.
tickets['Assignment group'].value_counts()
GRP_0 3968
GRP_8 661
GRP_24 289
GRP_12 257
GRP_9 252
...
GRP_73 1
GRP_64 1
GRP_35 1
GRP_61 1
GRP_67 1
Name: Assignment group, Length: 74, dtype: int64
tickets.size
33964
groups_freq = pd.DataFrame.from_dict(dict(tickets['Assignment group'].value_counts()), orient='index', columns=['frequency'])
groups_freq.head(20)
| | frequency |
|---|---|
| GRP_0 | 3968 |
| GRP_8 | 661 |
| GRP_24 | 289 |
| GRP_12 | 257 |
| GRP_9 | 252 |
| GRP_2 | 241 |
| GRP_19 | 215 |
| GRP_3 | 200 |
| GRP_6 | 184 |
| GRP_13 | 145 |
| GRP_10 | 140 |
| GRP_5 | 129 |
| GRP_14 | 118 |
| GRP_25 | 116 |
| GRP_33 | 107 |
| GRP_4 | 100 |
| GRP_29 | 97 |
| GRP_18 | 88 |
| GRP_16 | 85 |
| GRP_17 | 81 |
# Beyond the top 20 groups, the remaining 54 groups each contain fewer than 80 tickets. The data set is heavily skewed toward a few groups.
groups_freq.tail(30)
| | frequency |
|---|---|
| GRP_44 | 15 |
| GRP_36 | 15 |
| GRP_50 | 14 |
| GRP_65 | 11 |
| GRP_53 | 11 |
| GRP_52 | 9 |
| GRP_55 | 8 |
| GRP_51 | 8 |
| GRP_49 | 6 |
| GRP_46 | 6 |
| GRP_59 | 6 |
| GRP_43 | 5 |
| GRP_32 | 4 |
| GRP_66 | 4 |
| GRP_56 | 3 |
| GRP_63 | 3 |
| GRP_68 | 3 |
| GRP_38 | 3 |
| GRP_58 | 3 |
| GRP_69 | 2 |
| GRP_57 | 2 |
| GRP_71 | 2 |
| GRP_72 | 2 |
| GRP_54 | 2 |
| GRP_70 | 1 |
| GRP_73 | 1 |
| GRP_64 | 1 |
| GRP_35 | 1 |
| GRP_61 | 1 |
| GRP_67 | 1 |
# The bottom 30 groups have 15 or fewer tickets each, too few to contribute meaningfully to learning.
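The skew described above can be summarized in a few numbers instead of being read off the head and tail tables. The sketch below is one possible way to do that; the helper name `imbalance_summary`, the threshold, and the toy labels are assumptions made for illustration.

```python
import pandas as pd

def imbalance_summary(labels, rare_threshold):
    """Summarize how skewed a label column is."""
    counts = labels.value_counts()  # sorted from most to least frequent
    return {
        'n_classes': int(counts.size),
        'largest_share': float(counts.iloc[0] / counts.sum()),
        'n_rare_classes': int((counts < rare_threshold).sum()),
    }

# Toy labels standing in for tickets['Assignment group'] (hypothetical values).
toy_labels = pd.Series(['GRP_0'] * 6 + ['GRP_8'] * 3 + ['GRP_70'])
summary = imbalance_summary(toy_labels, rare_threshold=2)
print(summary)
```

On the real data the call would be `imbalance_summary(tickets['Assignment group'], rare_threshold=...)`, with the threshold chosen to match whatever cut-off is used for filtering.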
tickets
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| 0 | login issue | -verified user details.(employee# & manager na... | spxjnwir pjlcoqds | GRP_0 |
| 1 | outlook | \r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail... | hmjdrvpb komuaywn | GRP_0 |
| 2 | cant log in to vpn | \r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail... | eylqgodm ybqkwiam | GRP_0 |
| 3 | unable to access hr_tool page | unable to access hr_tool page | xbkucsvz gcpydteq | GRP_0 |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 |
| ... | ... | ... | ... | ... |
| 8495 | emails not coming in from zz mail | \r\n\r\nreceived from: avglmrts.vhqmtiua@gmail... | avglmrts vhqmtiua | GRP_29 |
| 8496 | telephony_software issue | telephony_software issue | rbozivdq gmlhrtvp | GRP_0 |
| 8497 | vip2: windows password reset for tifpdchb pedx... | vip2: windows password reset for tifpdchb pedx... | oybwdsgx oxyhwrfz | GRP_0 |
| 8498 | machine não está funcionando | i am unable to access the machine utilities to... | ufawcgob aowhxjky | GRP_62 |
| 8499 | an mehreren pc`s lassen sich verschiedene prgr... | an mehreren pc`s lassen sich verschiedene prgr... | kqvbrspl jyzoklfx | GRP_49 |
8491 rows × 4 columns
groups_freq['percentage'] = (groups_freq['frequency'] / groups_freq['frequency'].sum())*100
groups_freq.head(30)
| | frequency | percentage |
|---|---|---|
| GRP_0 | 3968 | 46.731834 |
| GRP_8 | 661 | 7.784713 |
| GRP_24 | 289 | 3.403604 |
| GRP_12 | 257 | 3.026734 |
| GRP_9 | 252 | 2.967848 |
| GRP_2 | 241 | 2.838299 |
| GRP_19 | 215 | 2.532093 |
| GRP_3 | 200 | 2.355435 |
| GRP_6 | 184 | 2.167000 |
| GRP_13 | 145 | 1.707690 |
| GRP_10 | 140 | 1.648805 |
| GRP_5 | 129 | 1.519256 |
| GRP_14 | 118 | 1.389707 |
| GRP_25 | 116 | 1.366152 |
| GRP_33 | 107 | 1.260158 |
| GRP_4 | 100 | 1.177718 |
| GRP_29 | 97 | 1.142386 |
| GRP_18 | 88 | 1.036391 |
| GRP_16 | 85 | 1.001060 |
| GRP_17 | 81 | 0.953951 |
| GRP_31 | 69 | 0.812625 |
| GRP_7 | 68 | 0.800848 |
| GRP_34 | 61 | 0.718408 |
| GRP_26 | 56 | 0.659522 |
| GRP_40 | 45 | 0.529973 |
| GRP_28 | 44 | 0.518196 |
| GRP_41 | 40 | 0.471087 |
| GRP_15 | 39 | 0.459310 |
| GRP_30 | 39 | 0.459310 |
| GRP_42 | 37 | 0.435756 |
groups_freq['cum percent'] = groups_freq['percentage'].cumsum(axis = 0)
groups_freq.head(45)
| | frequency | percentage | cum percent |
|---|---|---|---|
| GRP_0 | 3968 | 46.731834 | 46.731834 |
| GRP_8 | 661 | 7.784713 | 54.516547 |
| GRP_24 | 289 | 3.403604 | 57.920151 |
| GRP_12 | 257 | 3.026734 | 60.946885 |
| GRP_9 | 252 | 2.967848 | 63.914733 |
| GRP_2 | 241 | 2.838299 | 66.753033 |
| GRP_19 | 215 | 2.532093 | 69.285125 |
| GRP_3 | 200 | 2.355435 | 71.640561 |
| GRP_6 | 184 | 2.167000 | 73.807561 |
| GRP_13 | 145 | 1.707690 | 75.515251 |
| GRP_10 | 140 | 1.648805 | 77.164056 |
| GRP_5 | 129 | 1.519256 | 78.683312 |
| GRP_14 | 118 | 1.389707 | 80.073018 |
| GRP_25 | 116 | 1.366152 | 81.439171 |
| GRP_33 | 107 | 1.260158 | 82.699329 |
| GRP_4 | 100 | 1.177718 | 83.877046 |
| GRP_29 | 97 | 1.142386 | 85.019432 |
| GRP_18 | 88 | 1.036391 | 86.055824 |
| GRP_16 | 85 | 1.001060 | 87.056884 |
| GRP_17 | 81 | 0.953951 | 88.010835 |
| GRP_31 | 69 | 0.812625 | 88.823460 |
| GRP_7 | 68 | 0.800848 | 89.624308 |
| GRP_34 | 61 | 0.718408 | 90.342716 |
| GRP_26 | 56 | 0.659522 | 91.002238 |
| GRP_40 | 45 | 0.529973 | 91.532211 |
| GRP_28 | 44 | 0.518196 | 92.050406 |
| GRP_41 | 40 | 0.471087 | 92.521493 |
| GRP_15 | 39 | 0.459310 | 92.980803 |
| GRP_30 | 39 | 0.459310 | 93.440113 |
| GRP_42 | 37 | 0.435756 | 93.875869 |
| GRP_20 | 36 | 0.423978 | 94.299847 |
| GRP_45 | 35 | 0.412201 | 94.712048 |
| GRP_22 | 31 | 0.365092 | 95.077141 |
| GRP_1 | 31 | 0.365092 | 95.442233 |
| GRP_11 | 30 | 0.353315 | 95.795548 |
| GRP_21 | 29 | 0.341538 | 96.137086 |
| GRP_47 | 27 | 0.317984 | 96.455070 |
| GRP_23 | 25 | 0.294429 | 96.749499 |
| GRP_62 | 25 | 0.294429 | 97.043929 |
| GRP_48 | 25 | 0.294429 | 97.338358 |
| GRP_60 | 20 | 0.235544 | 97.573902 |
| GRP_39 | 19 | 0.223766 | 97.797668 |
| GRP_27 | 18 | 0.211989 | 98.009657 |
| GRP_37 | 16 | 0.188435 | 98.198092 |
| GRP_44 | 15 | 0.176658 | 98.374750 |
# It seems 35 groups cover ~96% of the tickets. The remaining 39 groups contribute only ~4% of the tickets.
# If we ignore all groups with fewer than 35 tickets, we still cover ~95% of the tickets.
groups_filtered = groups_freq[groups_freq['frequency'] >= 35]
len(groups_filtered)
32
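The trade-off between the frequency cut-off and ticket coverage can be computed directly, which makes it easy to try other thresholds. This is a minimal sketch with an illustrative helper name (`coverage_at_min_frequency`) and toy labels; it is not the notebook's own code.

```python
import pandas as pd

def coverage_at_min_frequency(labels, min_count):
    """Return (kept group names, fraction of rows those groups cover)."""
    counts = labels.value_counts()
    kept = counts[counts >= min_count]
    return list(kept.index), float(kept.sum() / counts.sum())

# Toy labels standing in for tickets['Assignment group'] (hypothetical values).
toy_labels = pd.Series(['A'] * 5 + ['B'] * 3 + ['C'] * 1 + ['D'] * 1)
kept_groups, coverage = coverage_at_min_frequency(toy_labels, min_count=3)
print(kept_groups, coverage)
```

Applied to the real column with `min_count=35`, this should reproduce the 32 groups found above and the cumulative share shown for GRP_45 in the table (about 94.7%).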
grp_list = list(groups_filtered.index)
grp_list
['GRP_0', 'GRP_8', 'GRP_24', 'GRP_12', 'GRP_9', 'GRP_2', 'GRP_19', 'GRP_3', 'GRP_6', 'GRP_13', 'GRP_10', 'GRP_5', 'GRP_14', 'GRP_25', 'GRP_33', 'GRP_4', 'GRP_29', 'GRP_18', 'GRP_16', 'GRP_17', 'GRP_31', 'GRP_7', 'GRP_34', 'GRP_26', 'GRP_40', 'GRP_28', 'GRP_41', 'GRP_15', 'GRP_30', 'GRP_42', 'GRP_20', 'GRP_45']
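# Take an explicit copy so that later in-place cleaning does not operate on a slice of `tickets`
tickets_filtered = tickets[tickets['Assignment group'].isin(grp_list)].copy()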
tickets_filtered.describe()
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| count | 8042 | 8042 | 8042 | 8042 |
| unique | 7067 | 7378 | 2864 | 32 |
| top | password reset | the | bpctwhsn kzqsbmtp | GRP_0 |
| freq | 38 | 56 | 784 | 3968 |
tickets_filtered['Assignment group'].value_counts()
GRP_0 3968
GRP_8 661
GRP_24 289
GRP_12 257
GRP_9 252
GRP_2 241
GRP_19 215
GRP_3 200
GRP_6 184
GRP_13 145
GRP_10 140
GRP_5 129
GRP_14 118
GRP_25 116
GRP_33 107
GRP_4 100
GRP_29 97
GRP_18 88
GRP_16 85
GRP_17 81
GRP_31 69
GRP_7 68
GRP_34 61
GRP_26 56
GRP_40 45
GRP_28 44
GRP_41 40
GRP_15 39
GRP_30 39
GRP_42 37
GRP_20 36
GRP_45 35
Name: Assignment group, dtype: int64
# Data cleaning functions (each one modifies df[col_name] in place)
def remove_spaces_and_tabs(df, col_name):
    df[col_name] = df[col_name].str.strip()
    # Turn literal escape sequences (a backslash followed by t, n, or r) into a space
    df[col_name] = df[col_name].replace(r"\\t|\\n|\\r", " ", regex=True)
    # Collapse any run of whitespace into a single space
    df[col_name] = df[col_name].replace(r"\s+", " ", regex=True)

def remove_digits(df, col_name):
    df[col_name] = df[col_name].replace(r"\d+", "", regex=True)

def lower_case(df, col_name):
    df[col_name] = df[col_name].str.lower()

spec_chars = ["!",'"',"#","%","&","'","(",")",
              "*","+",",","-",".","/",":",";","<",
              "=",">","?","@","[","\\","]","^","_",
              "`","{","|","}","~","–"]

def remove_special_chars(df, col_name):
    for char in spec_chars:
        # regex=False: treat characters such as "(" or "*" as literal text
        df[col_name] = df[col_name].str.replace(char, ' ', regex=False)

def clean_data(df, col_name):
    remove_spaces_and_tabs(df, col_name)
    remove_special_chars(df, col_name)
    remove_spaces_and_tabs(df, col_name)
    remove_digits(df, col_name)
    lower_case(df, col_name)
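As a sketch of an alternative to the in-place helpers above, the same kind of cleaning can be written as one function that takes a Series and returns a new one, which avoids assigning into a filtered DataFrame. The name `clean_text` and the sample strings are illustrative. The pattern uses `string.punctuation`, which also covers `$` (not in `spec_chars`), and whitespace is collapsed once more after the digits are removed, so the result may differ slightly from `clean_data`.

```python
import re
import string
import pandas as pd

# ASCII punctuation plus the en dash, roughly matching the spec_chars list.
PUNCT_PATTERN = '[' + re.escape(string.punctuation + '–') + ']'

def clean_text(series):
    """Return a cleaned copy of a text Series; the input is left unchanged."""
    cleaned = series.str.lower()
    cleaned = cleaned.str.replace(PUNCT_PATTERN, ' ', regex=True)  # punctuation -> space
    cleaned = cleaned.str.replace(r'\d+', '', regex=True)          # drop digits
    cleaned = cleaned.str.replace(r'\s+', ' ', regex=True)         # collapse whitespace
    return cleaned.str.strip()

# Hypothetical sample strings shaped like the ticket text.
sample = pd.Series([
    '\r\n\r\nReceived from: abc.def@gmail.com',
    'ticket_no1550391- employment status',
])
cleaned_sample = clean_text(sample)
print(cleaned_sample.tolist())
```

With this shape the cleaning step becomes an assignment such as `df['Description'] = clean_text(df['Description'])`, which keeps the mutation visible at the call site.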
tickets_filtered.head(10)
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| 0 | login issue | -verified user details.(employee# & manager na... | spxjnwir pjlcoqds | GRP_0 |
| 1 | outlook | received from: hmjdrvpb.komuaywn@gmail.com hel... | hmjdrvpb komuaywn | GRP_0 |
| 2 | cant log in to vpn | received from: eylqgodm.ybqkwiam@gmail.com hi ... | eylqgodm ybqkwiam | GRP_0 |
| 3 | unable to access hr_tool page | unable to access hr_tool page | xbkucsvz gcpydteq | GRP_0 |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 |
| 5 | unable to log in to engineering tool and skype | unable to log in to engineering tool and skype | eflahbxn ltdgrvkz | GRP_0 |
| 7 | ticket_no- employment status - new non-employe... | ticket_no- employment status - new non-employe... | eqzibjhw ymebpoih | GRP_0 |
| 8 | unable to disable add ins on outlook | unable to disable add ins on outlook | mdbegvct dbvichlg | GRP_0 |
| 9 | ticket update on inplant_ | ticket update on inplant_ | fumkcsji sarmtlhy | GRP_0 |
| 10 | engineering tool says not connected and unable... | engineering tool says not connected and unable... | badgknqs xwelumfz | GRP_0 |
tickets_filtered['Caller'].value_counts()
bpctwhsn kzqsbmtp 784
ZkBogxib QsEJzdZO 142
fumkcsji sarmtlhy 133
rbozivdq gmlhrtvp 87
rkupnshb gsmzfojw 66
...
rfvmeyho qgtxjsdc 1
hiysxgwf vqdbtexf 1
sdlixwmb zvygmnco 1
uyjglskh lhurepnw 1
oybwdsgx oxyhwrfz 1
Name: Caller, Length: 2864, dtype: int64
clean_data(tickets_filtered, 'Description')
clean_data(tickets_filtered, 'Short description')
tickets_filtered
| | Short description | Description | Caller | Assignment group |
|---|---|---|---|---|
| 0 | login issue | verified user details employee manager name ch... | spxjnwir pjlcoqds | GRP_0 |
| 1 | outlook | received from hmjdrvpb komuaywn gmail com hell... | hmjdrvpb komuaywn | GRP_0 |
| 2 | cant log in to vpn | received from eylqgodm ybqkwiam gmail com hi i... | eylqgodm ybqkwiam | GRP_0 |
| 3 | unable to access hr tool page | unable to access hr tool page | xbkucsvz gcpydteq | GRP_0 |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 |
| ... | ... | ... | ... | ... |
| 8493 | erp fi ob two accounts to be added | i am sorry i have another two accounts that ne... | ipwjorsc uboapexr | GRP_10 |
| 8494 | tablet needs reimaged due to multiple issues w... | tablet needs reimaged due to multiple issues w... | cpmaidhj elbaqmtp | GRP_3 |
| 8495 | emails not coming in from zz mail | received from avglmrts vhqmtiua gmail com good... | avglmrts vhqmtiua | GRP_29 |
| 8496 | telephony software issue | telephony software issue | rbozivdq gmlhrtvp | GRP_0 |
| 8497 | vip windows password reset for tifpdchb pedxruyf | vip windows password reset for tifpdchb pedxruyf | oybwdsgx oxyhwrfz | GRP_0 |
8042 rows × 4 columns
tickets_filtered['eq'] = tickets_filtered.apply(lambda x: x['Short description'] == x['Description'], axis=1)
tickets_filtered['eq'].value_counts()
False 5138
True 2904
Name: eq, dtype: int64
tickets_filtered[tickets_filtered['eq']]
| | Short description | Description | Caller | Assignment group | eq |
|---|---|---|---|---|---|
| 3 | unable to access hr tool page | unable to access hr tool page | xbkucsvz gcpydteq | GRP_0 | True |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 | True |
| 5 | unable to log in to engineering tool and skype | unable to log in to engineering tool and skype | eflahbxn ltdgrvkz | GRP_0 | True |
| 7 | ticket no employment status new non employee e... | ticket no employment status new non employee e... | eqzibjhw ymebpoih | GRP_0 | True |
| 8 | unable to disable add ins on outlook | unable to disable add ins on outlook | mdbegvct dbvichlg | GRP_0 | True |
| ... | ... | ... | ... | ... | ... |
| 8489 | account locked | account locked | sdvlxbfe ptnahjkw | GRP_0 | True |
| 8492 | hr tool etime option not visitble | hr tool etime option not visitble | tmopbken ibzougsd | GRP_0 | True |
| 8494 | tablet needs reimaged due to multiple issues w... | tablet needs reimaged due to multiple issues w... | cpmaidhj elbaqmtp | GRP_3 | True |
| 8496 | telephony software issue | telephony software issue | rbozivdq gmlhrtvp | GRP_0 | True |
| 8497 | vip windows password reset for tifpdchb pedxruyf | vip windows password reset for tifpdchb pedxruyf | oybwdsgx oxyhwrfz | GRP_0 | True |
2904 rows × 5 columns
tickets_filtered['substr'] = tickets_filtered.apply(lambda x: x['Short description'] in x['Description'], axis=1)
tickets_filtered[tickets_filtered['substr']]
| | Short description | Description | Caller | Assignment group | eq | substr |
|---|---|---|---|---|---|---|
| 0 | login issue | verified user details employee manager name ch... | spxjnwir pjlcoqds | GRP_0 | False | True |
| 1 | outlook | received from hmjdrvpb komuaywn gmail com hell... | hmjdrvpb komuaywn | GRP_0 | False | True |
| 3 | unable to access hr tool page | unable to access hr tool page | xbkucsvz gcpydteq | GRP_0 | True | True |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 | True | True |
| 5 | unable to log in to engineering tool and skype | unable to log in to engineering tool and skype | eflahbxn ltdgrvkz | GRP_0 | True | True |
| ... | ... | ... | ... | ... | ... | ... |
| 8489 | account locked | account locked | sdvlxbfe ptnahjkw | GRP_0 | True | True |
| 8492 | hr tool etime option not visitble | hr tool etime option not visitble | tmopbken ibzougsd | GRP_0 | True | True |
| 8494 | tablet needs reimaged due to multiple issues w... | tablet needs reimaged due to multiple issues w... | cpmaidhj elbaqmtp | GRP_3 | True | True |
| 8496 | telephony software issue | telephony software issue | rbozivdq gmlhrtvp | GRP_0 | True | True |
| 8497 | vip windows password reset for tifpdchb pedxruyf | vip windows password reset for tifpdchb pedxruyf | oybwdsgx oxyhwrfz | GRP_0 | True | True |
4936 rows × 6 columns
# Rows where the short description is a substring of the description but the two are not equal
tickets_filtered[tickets_filtered['substr'] & ~tickets_filtered['eq']]
| | Short description | Description | Caller | Assignment group | eq | substr |
|---|---|---|---|---|---|---|
| 0 | login issue | verified user details employee manager name ch... | spxjnwir pjlcoqds | GRP_0 | False | True |
| 1 | outlook | received from hmjdrvpb komuaywn gmail com hell... | hmjdrvpb komuaywn | GRP_0 | False | True |
| 16 | unable to login to company vpn | received from xyz company com hi i am unable t... | chobktqj qdamxfuc | GRP_0 | False | True |
| 31 | reset users | hi please reset users password client id usern... | qcehailo wqynckxg | GRP_0 | False | True |
| 47 | job job failed in job scheduler at | received from monitoring tool company com job ... | bpctwhsn kzqsbmtp | GRP_6 | False | True |
| ... | ... | ... | ... | ... | ... | ... |
| 8467 | hi it help team please unblock my new company ... | from ntydihzo aeptfbgs sent friday august am t... | ntydihzo aeptfbgs | GRP_0 | False | True |
| 8470 | please review your recent ticketing tool ticke... | from mikhghytr wafglhdrhjop sent thursday augu... | azxhejvq fyemlavd | GRP_16 | False | True |
| 8471 | 电脑开机开ä¸å‡ºæ¥ | to å°è´ºï¼œæ—©ä¸šç”µè„‘开机开ä¸å‡ºæ¥ | xqyjztnm onfusvlz | GRP_30 | False | True |
| 8483 | fw case id ref case ref others | from pacvbetl yptglhoe sent thursday august pm... | pacvbetl yptglhoe | GRP_0 | False | True |
| 8484 | please remove user hugcadrn ixhlwdgt ralfteimp... | please remove user hugcadrn ixhlwdgt ralfteimp... | hugcadrn ixhlwdgt | GRP_2 | False | True |
2032 rows × 6 columns
# Hence, in 4936 rows the short description is a substring of the description, and in 2032 of those rows the short and long descriptions are not equal.
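The two flags and the combined filter can also be computed without a row-wise `apply`. The following is a small sketch on toy data (the values are hypothetical): equality is a plain element-wise comparison, containment is done with `zip`, and the two masks are combined with `&` and `~`.

```python
import pandas as pd

# Toy rows standing in for the cleaned ticket text (hypothetical values).
toy = pd.DataFrame({
    'Short description': ['skype error', 'outlook', 'cant log in to vpn'],
    'Description': ['skype error', 'outlook not opening', 'received from user hi'],
})

# Element-wise equality needs no apply().
toy['eq'] = toy['Short description'] == toy['Description']

# Containment is inherently per-row; zip avoids building a Series object for every row.
toy['substr'] = [
    short in long
    for short, long in zip(toy['Short description'], toy['Description'])
]

# Combine boolean masks in one indexing step.
substr_not_eq = toy[toy['substr'] & ~toy['eq']]
print(substr_not_eq)
```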
# Check if having equality has any effect on group distribution
tickets_filtered[tickets_filtered['eq'] != True]['Assignment group'].value_counts()
GRP_0 1862
GRP_8 606
GRP_9 245
GRP_2 210
GRP_12 198
GRP_6 181
GRP_19 172
GRP_13 137
GRP_10 132
GRP_5 128
GRP_3 104
GRP_25 97
GRP_29 93
GRP_4 90
GRP_18 84
GRP_14 83
GRP_33 80
GRP_16 77
GRP_17 77
GRP_7 60
GRP_34 45
GRP_24 44
GRP_26 44
GRP_40 44
GRP_41 39
GRP_15 37
GRP_31 34
GRP_45 33
GRP_20 29
GRP_42 28
GRP_30 23
GRP_28 22
Name: Assignment group, dtype: int64
tickets_filtered[tickets_filtered['eq']]['Assignment group'].value_counts()
GRP_0 2106
GRP_24 245
GRP_3 96
GRP_12 59
GRP_8 55
GRP_19 43
GRP_31 35
GRP_14 35
GRP_2 31
GRP_33 27
GRP_28 22
GRP_25 19
GRP_34 16
GRP_30 16
GRP_26 12
GRP_4 10
GRP_42 9
GRP_7 8
GRP_16 8
GRP_13 8
GRP_10 8
GRP_9 7
GRP_20 7
GRP_29 4
GRP_17 4
GRP_18 4
GRP_6 3
GRP_15 2
GRP_45 2
GRP_40 1
GRP_5 1
GRP_41 1
Name: Assignment group, dtype: int64
# Similarly check for substr feature
tickets_filtered[tickets_filtered['substr']]['Assignment group'].value_counts()
GRP_0 2659
GRP_8 487
GRP_24 263
GRP_9 209
GRP_6 132
GRP_12 130
GRP_3 124
GRP_5 115
GRP_19 88
GRP_10 82
GRP_2 82
GRP_14 63
GRP_31 46
GRP_33 44
GRP_4 37
GRP_13 37
GRP_25 37
GRP_18 35
GRP_28 29
GRP_29 29
GRP_7 27
GRP_16 26
GRP_26 25
GRP_30 24
GRP_34 21
GRP_42 18
GRP_40 15
GRP_45 15
GRP_20 14
GRP_41 11
GRP_15 8
GRP_17 4
Name: Assignment group, dtype: int64
tickets_filtered[tickets_filtered['substr'] != True]['Assignment group'].value_counts()
GRP_0 1309
GRP_8 174
GRP_2 159
GRP_19 127
GRP_12 127
GRP_13 108
GRP_25 79
GRP_17 77
GRP_3 76
GRP_29 68
GRP_33 63
GRP_4 63
GRP_16 59
GRP_10 58
GRP_14 55
GRP_18 53
GRP_6 52
GRP_9 43
GRP_7 41
GRP_34 40
GRP_26 31
GRP_15 31
GRP_40 30
GRP_41 29
GRP_24 26
GRP_31 23
GRP_20 22
GRP_45 20
GRP_42 19
GRP_30 15
GRP_28 15
GRP_5 14
Name: Assignment group, dtype: int64
# The equality and substring flags do not appear to give a usable association with the assignment group.
# Instead, where the short description is not already contained in the description, combine the two strings so that both contribute to feature extraction.
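The comparison of the value counts is made by eye. One way to check it more formally, sketched here under the assumption that a chi-square test of independence is an acceptable measure, is to cross-tabulate the flag against the group and pass the table to `scipy.stats.chi2_contingency`. The helper name `flag_group_association` and the toy data are illustrative.

```python
import pandas as pd
import scipy.stats as st

def flag_group_association(df, flag_col, group_col):
    """Chi-square test of independence between a boolean flag and a label column."""
    table = pd.crosstab(df[flag_col], df[group_col])
    chi2, p_value, dof, _expected = st.chi2_contingency(table)
    return chi2, p_value, dof

# Toy data standing in for tickets_filtered (hypothetical values).
toy = pd.DataFrame({
    'eq': [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8,
    'Assignment group': ['GRP_A'] * 10 + ['GRP_B'] * 10,
})
chi2, p_value, dof = flag_group_association(toy, 'eq', 'Assignment group')
print(chi2, p_value, dof)
```

A small p-value would suggest the flag and the group are not independent. With many small groups the expected counts in some cells can be low, so the result is best treated as a rough check.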
tickets_comb = tickets_filtered.copy()
# Club short and long description in long description itself.
tickets_comb.loc[tickets_comb['substr'] != True, 'Description'] = tickets_comb['Short description'] + " "+ tickets_comb['Description']
tickets_comb.head(10)
| | Short description | Description | Caller | Assignment group | eq | substr |
|---|---|---|---|---|---|---|
| 0 | login issue | verified user details employee manager name ch... | spxjnwir pjlcoqds | GRP_0 | False | True |
| 1 | outlook | received from hmjdrvpb komuaywn gmail com hell... | hmjdrvpb komuaywn | GRP_0 | False | True |
| 2 | cant log in to vpn | cant log in to vpn received from eylqgodm ybqk... | eylqgodm ybqkwiam | GRP_0 | False | False |
| 3 | unable to access hr tool page | unable to access hr tool page | xbkucsvz gcpydteq | GRP_0 | True | True |
| 4 | skype error | skype error | owlgqjme qhcozdfx | GRP_0 | True | True |
| 5 | unable to log in to engineering tool and skype | unable to log in to engineering tool and skype | eflahbxn ltdgrvkz | GRP_0 | True | True |
| 7 | ticket no employment status new non employee e... | ticket no employment status new non employee e... | eqzibjhw ymebpoih | GRP_0 | True | True |
| 8 | unable to disable add ins on outlook | unable to disable add ins on outlook | mdbegvct dbvichlg | GRP_0 | True | True |
| 9 | ticket update on inplant | ticket update on inplant | fumkcsji sarmtlhy | GRP_0 | True | True |
| 10 | engineering tool says not connected and unable... | engineering tool says not connected and unable... | badgknqs xwelumfz | GRP_0 | True | True |
Row 2 shows that the merge happened where substr is False, and row 4 shows that no merge happened where substr is True.
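The same masked update can be tried on a toy frame. A minimal sketch, assuming the `substr` flag means "the short description occurs inside the description" (the flag is computed earlier in the notebook, so this definition is an assumption) and using made-up rows:

```python
import pandas as pd

# Hypothetical rows mirroring the two cases discussed above
toy = pd.DataFrame({
    "Short description": ["cant log in to vpn", "skype error"],
    "Description": ["received from user", "skype error"],
})

# Assumed definition of the flag: short description is contained in the description
toy["substr"] = [s in d for s, d in zip(toy["Short description"], toy["Description"])]

# Prepend the short description only where it is not already contained
mask = ~toy["substr"]
toy.loc[mask, "Description"] = (
    toy.loc[mask, "Short description"] + " " + toy.loc[mask, "Description"]
)
print(toy["Description"].tolist())
```

Restricting both sides of the assignment to `mask` makes it explicit that only the non-substring rows are touched.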
# Drop Short description, eq and substr (columns= already implies axis=1)
tickets_comb.drop(columns=['Short description', 'substr', 'eq'], inplace=True)
tickets_comb.head(10)
| | Description | Caller | Assignment group |
|---|---|---|---|
| 0 | verified user details employee manager name ch... | spxjnwir pjlcoqds | GRP_0 |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | hmjdrvpb komuaywn | GRP_0 |
| 2 | cant log in to vpn received from eylqgodm ybqk... | eylqgodm ybqkwiam | GRP_0 |
| 3 | unable to access hr tool page | xbkucsvz gcpydteq | GRP_0 |
| 4 | skype error | owlgqjme qhcozdfx | GRP_0 |
| 5 | unable to log in to engineering tool and skype | eflahbxn ltdgrvkz | GRP_0 |
| 7 | ticket no employment status new non employee e... | eqzibjhw ymebpoih | GRP_0 |
| 8 | unable to disable add ins on outlook | mdbegvct dbvichlg | GRP_0 |
| 9 | ticket update on inplant | fumkcsji sarmtlhy | GRP_0 |
| 10 | engineering tool says not connected and unable... | badgknqs xwelumfz | GRP_0 |
tickets_comb['Caller'].value_counts()
bpctwhsn kzqsbmtp 784
ZkBogxib QsEJzdZO 142
fumkcsji sarmtlhy 133
rbozivdq gmlhrtvp 87
rkupnshb gsmzfojw 66
...
rfvmeyho qgtxjsdc 1
hiysxgwf vqdbtexf 1
sdlixwmb zvygmnco 1
uyjglskh lhurepnw 1
oybwdsgx oxyhwrfz 1
Name: Caller, Length: 2864, dtype: int64
tickets_comb[tickets_comb['Caller'] == "bpctwhsn kzqsbmtp"].value_counts()
Description Caller Assignment group
received from monitoring tool company com job job failed in job scheduler at bpctwhsn kzqsbmtp GRP_8 185
GRP_9 127
GRP_6 48
received from monitoring tool company com job hr payroll na u failed in job scheduler at bpctwhsn kzqsbmtp GRP_10 36
received from monitoring tool company com job pp eu tool netch ap failed in job scheduler at bpctwhsn kzqsbmtp GRP_8 21
...
received from monitoring tool company com job hr toolmforrun failed in job scheduler at bpctwhsn kzqsbmtp GRP_5 1
received from monitoring tool company com job job a failed in job scheduler at bpctwhsn kzqsbmtp GRP_8 1
received from monitoring tool company com job job ap failed in job scheduler at bpctwhsn kzqsbmtp GRP_45 1
received from monitoring tool company com job job b failed in job scheduler at bpctwhsn kzqsbmtp GRP_8 1
job job d running longer than minutes kirtyled and rerun received from monitoring tool company com job job d failed in job scheduler at bpctwhsn kzqsbmtp GRP_6 1
Length: 128, dtype: int64
# It seems this caller is receiving monitoring tool emails, which are being logged as errors.
tickets_comb[tickets_comb['Caller'] == "ZkBogxib QsEJzdZO"].value_counts()
Description Caller Assignment group
received from monitoring tool company com abended job in job scheduler job at ZkBogxib QsEJzdZO GRP_9 30
GRP_8 25
GRP_6 14
received from monitoring tool company com abended job in job scheduler sid cold at ZkBogxib QsEJzdZO GRP_5 10
GRP_8 6
received from monitoring tool company com abended job in job scheduler snp heu regen at ZkBogxib QsEJzdZO GRP_6 4
received from monitoring tool company com abended job in job scheduler job b at ZkBogxib QsEJzdZO GRP_6 4
received from monitoring tool company com abended job in job scheduler hr tooldcvcgenratn at ZkBogxib QsEJzdZO GRP_6 4
received from monitoring tool company com abended job in job scheduler bk hana sid erp wly dp at ZkBogxib QsEJzdZO GRP_8 3
received from monitoring tool company com abended job in job scheduler sid hotf at ZkBogxib QsEJzdZO GRP_8 3
received from monitoring tool company com abended job in job scheduler job d at ZkBogxib QsEJzdZO GRP_6 3
received from monitoring tool company com abended job in job scheduler bk hana sid erp dly dp at ZkBogxib QsEJzdZO GRP_8 2
received from monitoring tool company com abended job in job scheduler bkbackup tool reporting tool prod inc at ZkBogxib QsEJzdZO GRP_8 2
received from monitoring tool company com abended job in job scheduler bkwin hostname inc at ZkBogxib QsEJzdZO GRP_8 2
received from monitoring tool company com abended job in job scheduler job at ZkBogxib QsEJzdZO GRP_10 2
GRP_5 2
received from monitoring tool company com abended job in job scheduler sid hot at ZkBogxib QsEJzdZO GRP_5 2
received from monitoring tool company com abended job in job scheduler pp eu tool netch ap at ZkBogxib QsEJzdZO GRP_8 2
received from monitoring tool company com abended job in job scheduler sid hoti at ZkBogxib QsEJzdZO GRP_8 2
received from monitoring tool company com abended job in job scheduler bkbackup tool hostname prod inc at ZkBogxib QsEJzdZO GRP_5 1
job d was running longer than minutes kirtyled and rerun to successful completion received from monitoring tool company com abended job in job scheduler job d at ZkBogxib QsEJzdZO GRP_6 1
job e was running longer than minutes kirtyled and rerun received from monitoring tool company com abended job in job scheduler job e at ZkBogxib QsEJzdZO GRP_6 1
received from monitoring tool company com abended job in job scheduler apo cif pds am at ZkBogxib QsEJzdZO GRP_6 1
received from monitoring tool company com abended job in job scheduler apo cif pds eu at ZkBogxib QsEJzdZO GRP_6 1
received from monitoring tool company com abended job in job scheduler archive idocs daily sid at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler sid stop at ZkBogxib QsEJzdZO GRP_14 1
received from monitoring tool company com abended job in job scheduler bk hana sid os dly dp at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler bkbackup tool hostname prod full at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler job was running longer min kirtyled and restarted ZkBogxib QsEJzdZO GRP_6 1
received from monitoring tool company com abended job in job scheduler mm zscr dly uschow at ZkBogxib QsEJzdZO GRP_29 1
received from monitoring tool company com abended job in job scheduler bkwin hostname inc at ZkBogxib QsEJzdZO GRP_5 1
received from monitoring tool company com abended job in job scheduler sid filesys at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler bkwin search server prod daily at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler bwhrattr at ZkBogxib QsEJzdZO GRP_9 1
received from monitoring tool company com abended job in job scheduler sid arc at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler pp eu tool netch keheu at ZkBogxib QsEJzdZO GRP_8 1
received from monitoring tool company com abended job in job scheduler job at ZkBogxib QsEJzdZO GRP_29 1
received from monitoring tool company com abended job in job scheduler sid stop hana slt at ZkBogxib QsEJzdZO GRP_14 1
job d was running longer than minutes kirtyled and rerun received from monitoring tool company com abended job in job scheduler job d at ZkBogxib QsEJzdZO GRP_6 1
dtype: int64
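Whether a caller maps cleanly to one assignment group can also be summarised in a single number per caller. A minimal sketch, assuming the same `Caller` and `Assignment group` column names; the caller names and rows are hypothetical:

```python
import pandas as pd

# Hypothetical tickets: caller "a" is spread over several groups
toy = pd.DataFrame({
    "Caller": ["a", "a", "a", "b", "c"],
    "Assignment group": ["GRP_8", "GRP_9", "GRP_6", "GRP_0", "GRP_0"],
})

# Number of distinct assignment groups each caller's tickets end up in
groups_per_caller = (
    toy.groupby("Caller")["Assignment group"]
    .nunique()
    .sort_values(ascending=False)
)
print(groups_per_caller)
```

Callers with a high count, like the monitoring accounts listed above, say little about the target group on their own.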
# The same holds for other callers. Caller seems like an irrelevant column, so let's drop it as well.
tickets_comb.drop(columns=['Caller'], inplace=True)
tickets_comb.head(10)
| | Description | Assignment group |
|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 |
| 3 | unable to access hr tool page | GRP_0 |
| 4 | skype error | GRP_0 |
| 5 | unable to log in to engineering tool and skype | GRP_0 |
| 7 | ticket no employment status new non employee e... | GRP_0 |
| 8 | unable to disable add ins on outlook | GRP_0 |
| 9 | ticket update on inplant | GRP_0 |
| 10 | engineering tool says not connected and unable... | GRP_0 |
# Let's keep this data intact and work on a copy in a separate data frame to run the algorithms
tickets_comb_modelling = tickets_comb.copy()
# Searched for ways to identify the language of the text, since scanning the Excel sheet had revealed characters from other languages
# Reference: https://stackoverflow.com/questions/39142778/python-how-to-determine-the-language
# Using fasttext to detect language https://amitness.com/2019/07/identify-text-language-python/
import fasttext
# pretrained model from :: https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
PRETRAINED_MODEL_PATH = 'lid.176.bin'
lang_detect_model = fasttext.load_model(PRETRAINED_MODEL_PATH)
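def predict_lang(sentence):
    # predict() returns (labels, probabilities); keep the labels
    detected_lang = lang_detect_model.predict(sentence)[0]
    # Strip fastText's '__label__' prefix, e.g. '__label__en' -> 'en'
    detected_lang = detected_lang[0].replace('__label__', '')
    return detected_lang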
tickets_comb_modelling['lang_predict'] = tickets_comb_modelling['Description'].apply(lambda v: predict_lang(v))
tickets_comb_modelling.head()
Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.
| | Description | Assignment group | lang_predict |
|---|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 | en |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 | en |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 | en |
| 3 | unable to access hr tool page | GRP_0 | en |
| 4 | skype error | GRP_0 | ja |
tickets_comb_modelling[tickets_comb_modelling['lang_predict'] != 'en']
| | Description | Assignment group | lang_predict |
|---|---|---|---|
| 4 | skype error | GRP_0 | ja |
| 39 | call for ecwtrjnq jpecxuty | GRP_0 | eo |
| 51 | call for ecwtrjnq jpecxuty | GRP_0 | eo |
| 126 | blank call gso | GRP_0 | hr |
| 223 | probleme mit bluescreen hallo es ist erneut pa... | GRP_24 | de |
| ... | ... | ... | ... |
| 8439 | der drucker fã¼r die ups lapels druckt nicht r... | GRP_33 | de |
| 8457 | æ— æ³•ç™»é™†hr tool考勤系ç»ÿ 显示javaæ’ä... | GRP_30 | zh |
| 8465 | vpn 连枥ä¸ä¸š vpn连ä¸ä¸šï¼œè¯·è½¬ç»™ è´ºæ... | GRP_30 | cs |
| 8467 | from ntydihzo aeptfbgs sent friday august am t... | GRP_0 | de |
| 8471 | to å°è´ºï¼œæ—©ä¸šç”µè„‘开机开ä¸å‡ºæ¥ | GRP_30 | zh |
612 rows × 3 columns
# We have 612 non-English rows. Some of their content does seem to be English, though, while some is truly non-English.
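Short English strings such as "skype error" being tagged as another language suggests the top label is not always reliable. One option, not used in this notebook, is to also look at the probability that fastText's `predict` returns next to the label. A minimal sketch, assuming a `predict` callable that returns `(labels, probabilities)`; `detect_language`, `fake_predict`, the 0.5 threshold and the scores below are all hypothetical:

```python
def detect_language(text, predict, min_confidence=0.5, fallback="unknown"):
    """Return the top language code, or `fallback` when the model is unsure."""
    # fastText predicts one line at a time, so newlines are flattened first
    labels, probs = predict(text.replace("\n", " "))
    lang = labels[0].replace("__label__", "")
    return lang if probs[0] >= min_confidence else fallback


def fake_predict(text):
    # Hypothetical stand-in for lang_detect_model.predict; the scores are invented
    if len(text.split()) <= 2:
        return (("__label__ja",), [0.21])
    return (("__label__en",), [0.97])


print(detect_language("skype error", fake_predict))
print(detect_language("unable to access hr tool page", fake_predict))
```

With the real model, `lang_detect_model.predict` could be passed as `predict`. Rows that come back as the fallback value could then be reviewed by hand instead of being dropped as non-English.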
tickets_comb_modelling
| | Description | Assignment group | lang_predict |
|---|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 | en |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 | en |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 | en |
| 3 | unable to access hr tool page | GRP_0 | en |
| 4 | skype error | GRP_0 | ja |
| ... | ... | ... | ... |
| 8493 | erp fi ob two accounts to be added i am sorry ... | GRP_10 | en |
| 8494 | tablet needs reimaged due to multiple issues w... | GRP_3 | en |
| 8495 | emails not coming in from zz mail received fro... | GRP_29 | en |
| 8496 | telephony software issue | GRP_0 | en |
| 8497 | vip windows password reset for tifpdchb pedxruyf | GRP_0 | en |
8042 rows × 3 columns
# Let's keep only the rows detected as English
tickets_english = tickets_comb_modelling[tickets_comb_modelling['lang_predict'] == 'en']
tickets_english
| | Description | Assignment group | lang_predict |
|---|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 | en |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 | en |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 | en |
| 3 | unable to access hr tool page | GRP_0 | en |
| 5 | unable to log in to engineering tool and skype | GRP_0 | en |
| ... | ... | ... | ... |
| 8493 | erp fi ob two accounts to be added i am sorry ... | GRP_10 | en |
| 8494 | tablet needs reimaged due to multiple issues w... | GRP_3 | en |
| 8495 | emails not coming in from zz mail received fro... | GRP_29 | en |
| 8496 | telephony software issue | GRP_0 | en |
| 8497 | vip windows password reset for tifpdchb pedxruyf | GRP_0 | en |
7430 rows × 3 columns
tickets_english.drop(columns=['lang_predict'], axis=1, inplace=True)
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py:4174: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy errors=errors,
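The SettingWithCopyWarning appears because the frame being modified is a filtered slice of another frame. A minimal sketch of one common way to avoid it, taking an explicit copy right after filtering; the toy rows are made up:

```python
import pandas as pd

# Hypothetical frame with the same columns as tickets_comb_modelling
toy = pd.DataFrame({
    "Description": ["skype error", "probleme mit bluescreen"],
    "lang_predict": ["en", "de"],
})

# .copy() makes the filtered result an independent frame, so later
# column assignments modify it directly instead of a slice of `toy`
english = toy[toy["lang_predict"] == "en"].copy()
english = english.drop(columns=["lang_predict"])
english["tokens"] = english["Description"].str.split()
print(english)
```

The original frame keeps its `lang_predict` column, and new columns can be added to the copy without the warning.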
tickets_english
| | Description | Assignment group |
|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 |
| 3 | unable to access hr tool page | GRP_0 |
| 5 | unable to log in to engineering tool and skype | GRP_0 |
| ... | ... | ... |
| 8493 | erp fi ob two accounts to be added i am sorry ... | GRP_10 |
| 8494 | tablet needs reimaged due to multiple issues w... | GRP_3 |
| 8495 | emails not coming in from zz mail received fro... | GRP_29 |
| 8496 | telephony software issue | GRP_0 |
| 8497 | vip windows password reset for tifpdchb pedxruyf | GRP_0 |
7430 rows × 2 columns
# Remove stop words, perform stemming and tokenize string
import nltk
nltk.download('stopwords')
from nltk.stem import SnowballStemmer
from nltk.tokenize import word_tokenize
nltk.download('punkt')
snow = SnowballStemmer('english')
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
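tickets_english['tokens'] = tickets_english.apply(lambda row: nltk.word_tokenize(row['Description']), axis=1)
# Note: the stop-word-filtered result below is only displayed; it is not assigned back, so 'tokens' still contains stop words
tickets_english['tokens'].apply(lambda x: [item for item in x if item not in stop_words])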
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy """Entry point for launching an IPython kernel.
0 [verified, user, details, employee, manager, n...
1 [received, hmjdrvpb, komuaywn, gmail, com, hel...
2 [cant, log, vpn, received, eylqgodm, ybqkwiam,...
3 [unable, access, hr, tool, page]
5 [unable, log, engineering, tool, skype]
...
8493 [erp, fi, ob, two, accounts, added, sorry, ano...
8494 [tablet, needs, reimaged, due, multiple, issue...
8495 [emails, coming, zz, mail, received, avglmrts,...
8496 [telephony, software, issue]
8497 [vip, windows, password, reset, tifpdchb, pedx...
Name: tokens, Length: 7430, dtype: object
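Keeping the filtered tokens would take an explicit assignment. A minimal sketch, using a tiny hypothetical stop-word set in place of the NLTK list and a hypothetical `tokens_nostop` column name:

```python
import pandas as pd

# Tiny hypothetical stop-word set (the notebook uses NLTK's English list)
stop_words_demo = {"to", "in", "and"}

toy = pd.DataFrame({
    "tokens": [["unable", "to", "log", "in"], ["skype", "error"]],
})

# Store the filtered tokens in a new column so later steps can use them
toy["tokens_nostop"] = toy["tokens"].apply(
    lambda toks: [t for t in toks if t not in stop_words_demo]
)
print(toy["tokens_nostop"].tolist())
```

Writing to a separate column keeps the unfiltered tokens available, which matters if part-of-speech tagging should still see the full sentence.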
def get_stemmed_list(tokens):
    # Stem every token with the Snowball stemmer
    output = []
    for token in tokens:
        output.append(snow.stem(token))
    return output
tickets_english['stemmed'] = tickets_english['tokens'].apply(lambda v: get_stemmed_list(v))
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy import sys
tickets_english
| | Description | Assignment group | tokens | stemmed |
|---|---|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 | [verified, user, details, employee, manager, n... | [verifi, user, detail, employe, manag, name, c... |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 | [received, from, hmjdrvpb, komuaywn, gmail, co... | [receiv, from, hmjdrvpb, komuaywn, gmail, com,... |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 | [cant, log, in, to, vpn, received, from, eylqg... | [cant, log, in, to, vpn, receiv, from, eylqgod... |
| 3 | unable to access hr tool page | GRP_0 | [unable, to, access, hr, tool, page] | [unabl, to, access, hr, tool, page] |
| 5 | unable to log in to engineering tool and skype | GRP_0 | [unable, to, log, in, to, engineering, tool, a... | [unabl, to, log, in, to, engin, tool, and, skype] |
| ... | ... | ... | ... | ... |
| 8493 | erp fi ob two accounts to be added i am sorry ... | GRP_10 | [erp, fi, ob, two, accounts, to, be, added, i,... | [erp, fi, ob, two, account, to, be, ad, i, am,... |
| 8494 | tablet needs reimaged due to multiple issues w... | GRP_3 | [tablet, needs, reimaged, due, to, multiple, i... | [tablet, need, reimag, due, to, multipl, issu,... |
| 8495 | emails not coming in from zz mail received fro... | GRP_29 | [emails, not, coming, in, from, zz, mail, rece... | [email, not, come, in, from, zz, mail, receiv,... |
| 8496 | telephony software issue | GRP_0 | [telephony, software, issue] | [telephoni, softwar, issu] |
| 8497 | vip windows password reset for tifpdchb pedxruyf | GRP_0 | [vip, windows, password, reset, for, tifpdchb,... | [vip, window, password, reset, for, tifpdchb, ... |
7430 rows × 4 columns
# Part-of-speech tagging
from nltk import pos_tag
nltk.download('averaged_perceptron_tagger')
tickets_english['tokens_pos'] = tickets_english['tokens'].apply(lambda v: pos_tag(v))
[nltk_data] Downloading package averaged_perceptron_tagger to /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-date!
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy """
# Apply WordNet lemmatization using the part-of-speech tags.
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
from nltk.corpus import wordnet
def get_wordnet_pos(tag):
    # Map a Penn Treebank tag to the matching WordNet POS constant
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return None
def get_lemmatized_list(tagged_tokens):
    # Lemmatize each (token, tag) pair, passing the POS when one is available
    output = []
    for token, tag in tagged_tokens:
        wntag = get_wordnet_pos(tag)
        if wntag is None:  # do not supply a tag when there is no mapping
            lemma = lemmatizer.lemmatize(token)
        else:
            lemma = lemmatizer.lemmatize(token, pos=wntag)
        output.append(lemma)
    return output
tickets_english['wdnet_lemm'] = tickets_english['tokens_pos'].apply(lambda v: get_lemmatized_list(v))
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Unzipping corpora/wordnet.zip.
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:32: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tickets_english
| | Description | Assignment group | tokens | stemmed | tokens_pos | wdnet_lemm |
|---|---|---|---|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 | [verified, user, details, employee, manager, n... | [verifi, user, detail, employe, manag, name, c... | [(verified, VBN), (user, NN), (details, NNS), ... | [verify, user, detail, employee, manager, name... |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 | [received, from, hmjdrvpb, komuaywn, gmail, co... | [receiv, from, hmjdrvpb, komuaywn, gmail, com,... | [(received, VBN), (from, IN), (hmjdrvpb, JJ), ... | [receive, from, hmjdrvpb, komuaywn, gmail, com... |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 | [cant, log, in, to, vpn, received, from, eylqg... | [cant, log, in, to, vpn, receiv, from, eylqgod... | [(cant, JJ), (log, NN), (in, IN), (to, TO), (v... | [cant, log, in, to, vpn, receive, from, eylqgo... |
| 3 | unable to access hr tool page | GRP_0 | [unable, to, access, hr, tool, page] | [unabl, to, access, hr, tool, page] | [(unable, JJ), (to, TO), (access, NN), (hr, NN... | [unable, to, access, hr, tool, page] |
| 5 | unable to log in to engineering tool and skype | GRP_0 | [unable, to, log, in, to, engineering, tool, a... | [unabl, to, log, in, to, engin, tool, and, skype] | [(unable, JJ), (to, TO), (log, VB), (in, IN), ... | [unable, to, log, in, to, engineering, tool, a... |
| ... | ... | ... | ... | ... | ... | ... |
| 8493 | erp fi ob two accounts to be added i am sorry ... | GRP_10 | [erp, fi, ob, two, accounts, to, be, added, i,... | [erp, fi, ob, two, account, to, be, ad, i, am,... | [(erp, NN), (fi, NNS), (ob, VBP), (two, CD), (... | [erp, fi, ob, two, account, to, be, add, i, be... |
| 8494 | tablet needs reimaged due to multiple issues w... | GRP_3 | [tablet, needs, reimaged, due, to, multiple, i... | [tablet, need, reimag, due, to, multipl, issu,... | [(tablet, NN), (needs, NNS), (reimaged, VBD), ... | [tablet, need, reimaged, due, to, multiple, is... |
| 8495 | emails not coming in from zz mail received fro... | GRP_29 | [emails, not, coming, in, from, zz, mail, rece... | [email, not, come, in, from, zz, mail, receiv,... | [(emails, NNS), (not, RB), (coming, VBG), (in,... | [email, not, come, in, from, zz, mail, receive... |
| 8496 | telephony software issue | GRP_0 | [telephony, software, issue] | [telephoni, softwar, issu] | [(telephony, NN), (software, NN), (issue, NN)] | [telephony, software, issue] |
| 8497 | vip windows password reset for tifpdchb pedxruyf | GRP_0 | [vip, windows, password, reset, for, tifpdchb,... | [vip, window, password, reset, for, tifpdchb, ... | [(vip, NN), (windows, NNS), (password, VBP), (... | [vip, window, password, reset, for, tifpdchb, ... |
7430 rows × 6 columns
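The tag mapping done in `get_wordnet_pos` can also be written as a lookup on the first letter of the Penn Treebank tag. A minimal sketch, assuming the single-letter values that NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV` constants stand for (`'a'`, `'v'`, `'n'`, `'r'`); `PENN_TO_WORDNET` and `penn_to_wordnet` are hypothetical names:

```python
# First letter of the Penn Treebank tag -> WordNet POS letter
PENN_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def penn_to_wordnet(tag):
    """Return the WordNet POS letter for a Penn tag, or None if unmapped."""
    return PENN_TO_WORDNET.get(tag[:1])


# Tags taken from the tokens_pos column shown above
for tag in ["VBN", "NNS", "JJ", "RB", "IN"]:
    print(tag, penn_to_wordnet(tag))
```

Tags without a mapping, such as `IN` or `TO`, return `None`, which matches the branch where the lemmatizer is called without a `pos` argument.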
def get_bag_of_words(listOfListOfWords):
    # Concatenate every token of every ticket into one space-separated string
    output = ""
    for listOfWords in listOfListOfWords:
        for word in listOfWords:
            output = output + " " + word
    return output
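Repeated string concatenation in a loop can get slow on large groups, because each step may copy the whole string built so far. A minimal sketch of an alternative based on `str.join`; `get_bag_of_words_joined` is a hypothetical name, and unlike the function above its result has no leading space:

```python
from itertools import chain


def get_bag_of_words_joined(list_of_token_lists):
    """Flatten the token lists and join all tokens with single spaces."""
    return " ".join(chain.from_iterable(list_of_token_lists))


print(get_bag_of_words_joined([["skype", "error"], ["vpn"]]))
```

The missing leading space should make no difference to a word cloud, which splits the text into words anyway.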
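# Draw a word cloud for GRP_0, which is the largest group
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
# Note: this rebinds the name `stopwords` (until now the nltk.corpus module) to wordcloud's STOPWORDS set;
# the WordCloud calls below use the NLTK-based `stop_words` set instead
stopwords = STOPWORDS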
tickets_grp0 = tickets_english[tickets_english['Assignment group'] == 'GRP_0']
wordcloud_grp0 = WordCloud(stopwords = stop_words, max_words=500, background_color="white",width=800, height=400).generate(get_bag_of_words(tickets_grp0['wdnet_lemm'].values))
plt.figure( figsize=(20,10) )
plt.imshow(wordcloud_grp0,interpolation='bilinear')
plt.axis("off")
plt.show()
# From the word cloud, it seems the most prominent issue for GRP_0 is resolving email access issues through password resets, etc.
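A word cloud gives a visual impression; a plain frequency count gives the same information as numbers. A minimal sketch using `collections.Counter`; `top_terms` is a hypothetical helper and the token lists are made up:

```python
from collections import Counter


def top_terms(token_lists, stop_words=frozenset(), n=3):
    """Return the n most frequent tokens across all lists, skipping stop words."""
    counts = Counter(
        token
        for tokens in token_lists
        for token in tokens
        if token not in stop_words
    )
    return counts.most_common(n)


# Hypothetical lemmatized tickets for one group
demo = [
    ["password", "reset", "outlook"],
    ["password", "reset"],
    ["unable", "to", "login", "password"],
]
top = top_terms(demo, stop_words={"to"}, n=2)
print(top)
```

On the real data, the token lists of one group (for example the `wdnet_lemm` column of `tickets_grp0`) and the `stop_words` set could be passed in the same way.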
# Show word cloud for all selected groups
tickets_english['Assignment group'].unique()
array(['GRP_0', 'GRP_3', 'GRP_4', 'GRP_5', 'GRP_6', 'GRP_7', 'GRP_8',
'GRP_9', 'GRP_10', 'GRP_12', 'GRP_13', 'GRP_14', 'GRP_15',
'GRP_16', 'GRP_17', 'GRP_18', 'GRP_19', 'GRP_2', 'GRP_20',
'GRP_24', 'GRP_25', 'GRP_26', 'GRP_28', 'GRP_29', 'GRP_30',
'GRP_31', 'GRP_34', 'GRP_33', 'GRP_40', 'GRP_41', 'GRP_45',
'GRP_42'], dtype=object)
for grp in tickets_english['Assignment group'].unique():
    tickets_grp = tickets_english[tickets_english['Assignment group'] == grp]
    wordcloud_grp = WordCloud(stopwords=stop_words, max_words=500, background_color="white", width=800, height=400).generate(get_bag_of_words(tickets_grp['wdnet_lemm'].values))
    print("Group Name:" + grp)
    plt.figure(figsize=(20, 10))
    plt.imshow(wordcloud_grp, interpolation='bilinear')
    plt.axis("off")
    plt.show()
Group Name:GRP_0
Group Name:GRP_3
Group Name:GRP_4
Group Name:GRP_5
Group Name:GRP_6
Group Name:GRP_7
Group Name:GRP_8
Group Name:GRP_9
Group Name:GRP_10
Group Name:GRP_12
Group Name:GRP_13
Group Name:GRP_14
Group Name:GRP_15
Group Name:GRP_16
Group Name:GRP_17
Group Name:GRP_18
Group Name:GRP_19
Group Name:GRP_2
Group Name:GRP_20
Group Name:GRP_24
Group Name:GRP_25
Group Name:GRP_26
Group Name:GRP_28
Group Name:GRP_29
Group Name:GRP_30
Group Name:GRP_31
Group Name:GRP_34
Group Name:GRP_33
Group Name:GRP_40
Group Name:GRP_41
Group Name:GRP_45
Group Name:GRP_42
# Recreate sentences from the WordNet-lemmatized tokens
def tok_2_sentence(tokens):
    sentence = ""
    for token in tokens:
        sentence = sentence + " " + token
    return sentence
tickets_english['wdnet_Description'] = tickets_english['wdnet_lemm'].apply(lambda v: tok_2_sentence(v))
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:8: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tickets_english
| | Description | Assignment group | tokens | stemmed | tokens_pos | wdnet_lemm | wdnet_Description |
|---|---|---|---|---|---|---|---|
| 0 | verified user details employee manager name ch... | GRP_0 | [verified, user, details, employee, manager, n... | [verifi, user, detail, employe, manag, name, c... | [(verified, VBN), (user, NN), (details, NNS), ... | [verify, user, detail, employee, manager, name... | verify user detail employee manager name chec... |
| 1 | received from hmjdrvpb komuaywn gmail com hell... | GRP_0 | [received, from, hmjdrvpb, komuaywn, gmail, co... | [receiv, from, hmjdrvpb, komuaywn, gmail, com,... | [(received, VBN), (from, IN), (hmjdrvpb, JJ), ... | [receive, from, hmjdrvpb, komuaywn, gmail, com... | receive from hmjdrvpb komuaywn gmail com hell... |
| 2 | cant log in to vpn received from eylqgodm ybqk... | GRP_0 | [cant, log, in, to, vpn, received, from, eylqg... | [cant, log, in, to, vpn, receiv, from, eylqgod... | [(cant, JJ), (log, NN), (in, IN), (to, TO), (v... | [cant, log, in, to, vpn, receive, from, eylqgo... | cant log in to vpn receive from eylqgodm ybqk... |
| 3 | unable to access hr tool page | GRP_0 | [unable, to, access, hr, tool, page] | [unabl, to, access, hr, tool, page] | [(unable, JJ), (to, TO), (access, NN), (hr, NN... | [unable, to, access, hr, tool, page] | unable to access hr tool page |
| 5 | unable to log in to engineering tool and skype | GRP_0 | [unable, to, log, in, to, engineering, tool, a... | [unabl, to, log, in, to, engin, tool, and, skype] | [(unable, JJ), (to, TO), (log, VB), (in, IN), ... | [unable, to, log, in, to, engineering, tool, a... | unable to log in to engineering tool and skype |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 8493 | erp fi ob two accounts to be added i am sorry ... | GRP_10 | [erp, fi, ob, two, accounts, to, be, added, i,... | [erp, fi, ob, two, account, to, be, ad, i, am,... | [(erp, NN), (fi, NNS), (ob, VBP), (two, CD), (... | [erp, fi, ob, two, account, to, be, add, i, be... | erp fi ob two account to be add i be sorry i ... |
| 8494 | tablet needs reimaged due to multiple issues w... | GRP_3 | [tablet, needs, reimaged, due, to, multiple, i... | [tablet, need, reimag, due, to, multipl, issu,... | [(tablet, NN), (needs, NNS), (reimaged, VBD), ... | [tablet, need, reimaged, due, to, multiple, is... | tablet need reimaged due to multiple issue wi... |
| 8495 | emails not coming in from zz mail received fro... | GRP_29 | [emails, not, coming, in, from, zz, mail, rece... | [email, not, come, in, from, zz, mail, receiv,... | [(emails, NNS), (not, RB), (coming, VBG), (in,... | [email, not, come, in, from, zz, mail, receive... | email not come in from zz mail receive from a... |
| 8496 | telephony software issue | GRP_0 | [telephony, software, issue] | [telephoni, softwar, issu] | [(telephony, NN), (software, NN), (issue, NN)] | [telephony, software, issue] | telephony software issue |
| 8497 | vip windows password reset for tifpdchb pedxruyf | GRP_0 | [vip, windows, password, reset, for, tifpdchb,... | [vip, window, password, reset, for, tifpdchb, ... | [(vip, NN), (windows, NNS), (password, VBP), (... | [vip, window, password, reset, for, tifpdchb, ... | vip window password reset for tifpdchb pedxruyf |
7430 rows × 7 columns
# TF-IDF vectorizer, as recommended during the mentoring session
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
tickets_tfidf = vectorizer.fit_transform(tickets_english['wdnet_Description'])
print(tickets_tfidf.shape)
(7430, 12042)
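To see what the numbers in this matrix mean, the weighting can be reproduced by hand on a few short texts. A minimal sketch, assuming the smoothed formula documented for scikit-learn's defaults, idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation of each row; it splits on whitespace only, which is a simplification of the vectorizer's own tokenizer, and the `tfidf` function and sample texts are hypothetical:

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {term: tf[term] * idf[term] for term in tf}
        # Guard against empty documents, whose norm would be zero
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


rows = tfidf(["skype error", "vpn error", "vpn login issue"])
for row in rows:
    print(row)
```

In the first text, "skype" gets a higher weight than "error" because "error" also occurs in another text, which is the effect that lets rarer, more specific terms stand out.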
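# get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2; newer versions use get_feature_names_out()
vectorizer.get_feature_names()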
['aa', 'aaa', 'aab', 'aacbcc', 'aacbccbefb', 'aacfa', 'aacount', 'aad', 'aadb', 'aae', 'aaefe', 'aao', 'aaplant', 'ab', 'abandon', 'abap', 'abb', 'abba', 'abbc', 'abc', 'abca', 'abcdegy', 'abcdri', 'abd', 'abdhtyu', 'abe', 'abend', 'abended', 'abeoucfj', 'abf', 'abff', 'abgebildet', 'abgrtyreu', 'abhay', 'abholen', 'ability', 'abl', 'able', 'abode', 'abort', 'aborted', 'about', 'above', 'abovementioned', 'abreu', 'abrurto', 'absence', 'absent', 'absolutely', 'abwfnzwbnvbw', 'ac', 'acache', 'acb', 'acbccb', 'acc', 'acccount', 'accdb', 'accees', 'accept', 'acceptance', 'acceptingâ', 'acces', 'accesible', 'acceso', 'access', 'accessibility', 'accessible', 'accident', 'accidental', 'accidentally', 'accidently', 'accompanying', 'accomplish', 'accont', 'accord', 'accordance', 'accordingly', 'accound', 'account', 'accountant', 'accounting', 'accout', 'accross', 'accrual', 'accsess', 'acct', 'accts', 'accuracy', 'accurate', 'acd', 'ace', 'acess', 'acf', 'acgyuna', 'achghar', 'achghyardr', 'achthyardk', 'ack', 'acknowledgement', 'acl', 'acmglkti', 'aconnection', 'acqpinyd', 'acquire', 'acrobat', 'across', 'act', 'action', 'activate', 'activated', 'activation', 'active', 'actively', 'activesync', 'activex', 'activity', 'actual', 'actuall', 'actually', 'acuvyqnx', 'acwuvpuzdunkhjwjqwvmyqktqtf', 'acxedqjm', 'aczyfqjr', 'ad', 'ada', 'adadd', 'adaef', 'adapoter', 'adapter', 'adaptor', 'adbfe', 'adc', 'add', 'adddf', 'added', 'addiitional', 'addin', 'addins', 'addition', 'additional', 'additionally', 'addon', 'addr', 'address', 'addressâ', 'adeca', 'adfa', 'aditya', 'adiuklhl', 'adjtmlzn', 'adjust', 'adjustment', 'admin', 'adminhtml', 'administrador', 'administration', 'administrative', 'administrator', 'admins', 'adobe', 'adopter', 'adowy', 'adpvilqu', 'adress', 'adrhtykins', 'adt', 'advance', 'advanced', 'advantage', 'advice', 'advise', 'advisor', 'adwares', 'adwind', 'adwjfpbreu', 'ae', 'aeae', 'aeb', 'aecd', 'aed', 'aedwrpvo', 'aedzqlvj', 'aee', 'aeea', 'aef', 'aeftjxos', 
'aegpkruc', 'aeithcvp', 'aenl', 'aeophctw', 'aero', 'aerospace', 'aerp', 'aes', 'aese', 'aetwpiox', 'aevzsogn', 'af', 'afab', 'afc', 'afcbrhqw', 'afd', 'afdceb', 'afdfe', 'afe', 'afef', 'afefsano', 'aff', 'affect', 'affected', 'affiliate', 'afgdmesz', 'afghtyjith', 'aficio', 'afkstcev', 'afplnyxb', 'afqvyuwh', 'africa', 'afsid', 'after', 'afterit', 'afternoon', 'afukzhnm', 'afwzehqs', 'ag', 'again', 'agains', 'against', 'agbighyail', 'agent', 'agentid', 'agfmpyhr', 'agfxelwz', 'aggergrythator', 'aghl', 'aghw', 'aghynil', 'aghynilthykurtyar', 'agian', 'agnwfwieszka', 'ago', 'agr', 'agree', 'agreement', 'agrtywal', 'agthynew', 'agvl', 'agvw', 'ahbgjrqz', 'ahdwqrson', 'ahead', 'ahgjsvoq', 'ahjklpxm', 'ahlqgjwx', 'ahmbnsoi', 'ahost', 'ahrskvln', 'ahydmrbu', 'ahyeqpmx', 'ahyiuqev', 'ahypftjx', 'aidl', 'aidle', 'aidw', 'aiiw', 'aimcfeko', 'aiml', 'ain', 'ainl', 'ainuhbmk', 'ainw', 'aiobpkzm', 'aiqjxhuv', 'air', 'aircap', 'airwaybill', 'aisl', 'aitsgqwo', 'aiuknwzj', 'aiul', 'aiuw', 'ajlbguzn', 'ajnpuqym', 'ajomhkfv', 'ajuiegrson', 'ajuyanni', 'ak', 'aka', 'akbvznci', 'akiowsmp', 'akirtyethsyd', 'akisjtzm', 'aksthyuhath', 'aktion', 'aktplhre', 'akã', 'al', 'alabama', 'alarm', 'albaney', 'albussdqp', 'ald', 'ale', 'alejayhsdtffndro', 'alert', 'alex', 'alexandfrre', 'alexandre', 'alexansxcddre', 'alexgnhtjunder', 'algorithm', 'alicona', 'aliuytre', 'aliv', 'alive', 'aljbtwsh', 'alkuozfr', 'all', 'alle', 'allert', 'allinvest', 'allocate', 'allocation', 'allow', 'allowe', 'alloy', 'alluser', 'allways', 'alm', 'almeida', 'almost', 'almrgtyeiba', 'alone', 'along', 'alook', 'alparslanthyr', 'alphabet', 'alphastdgtyal', 'alr', 'already', 'alrthyu', 'also', 'alt', 'alte', 'alternate', 'although', 'alto', 'altogether', 'alvesdss', 'always', 'alwaysupservice', 'alwjivqg', 'amadeu', 'amar', 'amb', 'ambals', 'ambiance', 'ambient', 'amend', 'amerirtca', 'amerirtcas', 'amet', 'amfgtyartya', 'amhywoqg', 'amiebrlf', 'amihtar', 'amniujsh', 'amongst', 'amount', 'amrice', 'amrthruta', 
'amssm', 'amunt', 'amy', 'an', 'ana', 'analog', 'analtyicspro', 'analyse', 'analyser', 'analysis', 'analyst', 'analytics', 'analyze', 'analyzer', 'anantadth', 'anbtr', 'ancile', 'and', 'anderen', 'andhtyju', 'andrdgrtew', 'android', 'andthyerh', 'anecdfps', 'anfghyudrejy', 'anftgup', 'angelique', 'angry', 'angyta', 'anira', 'anivdcor', 'anmeldungen', 'annette', 'anniversary', 'annotatorlist', 'announce', 'annoy', 'annyhtie', 'anonymizing', 'anonymous', 'another', 'anpocezt', 'anrgtdy', 'ansi', 'answer', 'answkqpe', 'anteagroup', 'anti', 'anticipation', 'antigvjx', 'antispam', 'antivirus', 'antjuyhony', 'anubis', 'anubisnetworks', 'anup', 'anuxbyzg', 'anvqzdif', 'anwmfvlgenkataramdntyana', 'anxmvsor', 'any', 'anybody', 'anyhusppa', 'anylonger', 'anymore', 'anyone', 'anyother', 'anything', 'anytime', 'anyway', 'anyways', 'anywhere', 'ao', 'aobrelcs', 'aoehpltm', 'aofnvyzt', 'aolhgbps', 'aolijwnx', 'aorthyme', 'aoshpjiu', 'aosqelnr', 'aouezihl', 'aoxtugzr', 'aoyrspjv', 'ap', 'apac', 'apacc', 'apacjun', 'apacnet', 'apacpuchn', 'aparecido', 'apart', 'apc', 'api', 'apis', 'apkqmrdu', 'apktrsyq', 'aplications', 'apo', 'apokrfjv', 'apologise', 'apologize', 'apost', 'app', 'appair', 'apparently', 'appear', 'appends', 'apple', 'applewebkit', 'appliance', 'applicable', 'applicaiton', 'application', 'apply', 'appointment', 'appoval', 'appreciate', 'appreciated', 'appreciatehub', 'apprentice', 'approfghaching', 'appropriate', 'approva', 'approval', 'approve', 'approved', 'approver', 'approx', 'approximate', 'approximately', 'apps', 'april', 'aprtgghjk', 'apt', 'apul', 'apusm', 'apvpn', 'apxmsjkc', 'ap<æ', 'aqihfoly', 'aqjdvexo', 'aqrhwjgo', 'aqritplu', 'aqrzskpg', 'aqstdryv', 'aqvocmuy', 'aqzcisjy', 'aqzz', 'ar', 'aracä', 'araghtyu', 'arbeitsstationsvertrauensstellung', 'arbitrary', 'arc', 'arcade', 'arcgonvy', 'architecture', 'archive', 'archived', 'archiving', 'arded', 'area', 'aren', 'arexjftu', 'argentina', 'argtxmvcumar', 'arise', 'arithel', 'arjpdohf', 'arkulcoi', 
'aroqwuvz', 'around', 'arpa', 'arrange', 'arrangement', 'arrive', 'arrojhsjd', 'arrow', 'arsbtkvd', 'art', 'article', 'aryndruh', 'as', 'asa', 'asano', 'ascend', 'ascii', 'ascpqvni', 'asfgthok', 'ashdtyf', 'asheshopsw', 'ashley', 'ashtusis', 'asia', 'asiapac', 'asid', 'asignment', 'asistance', 'asjadjs', 'asjdidwni', 'ask', 'asks', 'aspap', 'aspect', 'aspx', 'assembly', 'assemblyresourcelists', 'assessment', 'asset', 'assign', 'assigned', 'assignment', 'assing', 'assist', 'assistance', 'assistant', 'associate', 'association', 'asst', 'assume', 'assumption', 'assunto', 'assurance', 'assyli', 'assylias', 'astmvqhc', 'asuenpyg', 'aswl', 'aswubnyd', 'aswyuysm', 'at', 'atache', 'atached', 'atcl', 'atclx', 'atdclmyi', 'athjyul', 'athrdyau', 'atjsv', 'atlanta', 'atleast', 'atm', 'atp', 'att', 'attach', 'attached', 'attachement', 'attachements', 'attachment', 'attachments', 'attachã', 'attack', 'attacker', 'attempt', 'attempted', 'attend', 'attendance', 'attendee', 'attens', 'attention', 'attn', 'attrachment', 'attribudes', 'attribute', 'attributecode', 'attributetext', 'attributetype', 'atttached', 'atuldhy', 'atydjkwl', 'au', 'audi', 'audible', 'audio', 'audit', 'auditor', 'aueftuyaienptkn', 'auf', 'aufmvcapktttrux', 'auftragsausgang', 'auftragspapiere', 'aug', 'augdec', 'august', 'aunpdmlj', 'aupdonjy', 'aupnvems', 'aurangabad', 'aus', 'ausdruck', 'ausgabe', 'ausgefã¼hrt', 'ausgeschaltet', 'australia', 'auswerten', 'autamatically', 'authenication', 'authentic', 'authenticate', 'authentication', 'authorisation', 'authoritative', 'authority', 'authorization', 'authorization(s', 'authorize', 'authorized', 'auto', 'autobank', 'autoforward', 'automatci', 'automate', 'automated', 'automatic', 'automatical', 'automatically', 'automaticaly', 'automation', 'automatisch', 'autoresolve', 'autorice', 'auvolfhp', 'auzroqes', 'av', 'availability', 'available', 'avast', 'ave', 'average', 'avez', 'avglmrts', 'avigtshay', 'avmeocnk', 'avoid', 'avsbdhyu', 'avurmegj', 'avwqmhsp', 'aw', 
'awa', 'await', 'award', 'aware', 'awareness', 'away', 'awb', 'awddmwdol', 'awhile', 'awkrdqzb', 'awswering', 'awyl', 'awylw', 'awyrthysm', 'awysinic', 'awysv', 'awyw', 'awywkjswx', 'awywkwdx', 'awywx', 'axcbfuqo', 'axcl', 'axcrspyh', 'axhg', 'axhkewnv', 'axpqctfr', 'aykegsvr', 'aylrbosw', 'aypgzieh', 'ayrhcfxi', 'aytjedki', 'ayuda', 'ayueswcm', 'ayujdm', 'az', 'azbtkqwx', 'azdxonjg', 'azerbaijan', 'azgtrbow', 'azjfshry', 'azovgeck', 'azoyklqe', 'aztlkeif', 'aztlkeifowndararajan', 'azubi', 'azubis', 'azure', 'azurewebsites', 'azvixyqg', 'azvoespk', 'azxhejvq', 'azyfsrqh', 'ba', 'baa', 'baac', 'baafd', 'baafdcebef', 'babanlal', 'babhjbu', 'babiluntr', 'bac', 'bachsdadgtadw', 'bachsmhdyhti', 'back', 'backdate', 'backend', 'backflush', 'background', 'backorder', 'backorderreports', 'backup', 'backups', 'bactelephony', 'bad', 'badfe', 'badge', 'badgknqs', 'baf', 'bag', 'bagtylleg', 'bahbrgy', 'bahdqrcs', 'bajrpckl', 'baker', 'bakertm', 'bakheyr', 'bakyhrer', 'balance', 'balancing', 'bals', 'balzers', 'band', 'bandwidth', 'bank', 'banking', 'bankrd', 'bankruped', 'bankverbindung', 'baoapacg', 'bapi', 'bar', 'baranwfhrty', 'barcelona', 'barcode', 'barcodes', 'bare', 'barley', 'barrtyh', 'bas', 'base', 'basic', 'basically', 'basis', 'batch', 'bath', 'bathylardb', 'batia', 'battel', 'batter', 'battery', 'battle', 'batuhan', 'bau', 'baugtymli', 'baurhty', 'bauuyternfeyt', 'bay', 'bb', 'bba', 'bbb', 'bbcc', 'bbd', 'bbe', 'bbed', 'bbf', 'bbfa', 'bbl', 'bbo', 'bc', 'bca', 'bcb', 'bcc', 'bcd', 'bcda', 'bcee', 'bcefayom', 'bcf', 'bcfac', 'bcl', 'bcom', 'bctypmjw', 'bcxfhekz', 'bcxpeuko', 'bd', 'bda', 'bdacd', 'bdbe', 'bdc', 'bdcafefef', 'bdclient', 'bddbf', 'bddf', 'bddjwwwdw', 'bdeb', 'bdegqtyj', 'bdf', 'bdfzamjs', 'bdjiosrp', 'bdm', 'bdvcealj', 'bdwdwarbara', 'be', 'beach', 'beachten', 'beacon', 'beahleb', 'beamer', 'beathe', 'beb', 'because', 'beckes', 'become', 'becoxvqkahadikar', 'bed', 'bedord', 'bee', 'beenefits', 'beep', 'bef', 'befb', 'befdba', 'beff', 'before', 
'begin', 'beginning', 'behalf', 'behave', 'behavior', 'beheben', 'behind', 'behsnjty', 'bei', 'beilage', 'beilageproben', 'belgium', 'belhadjhamida', 'believe', 'bellusco', 'belo', 'belong', 'below', 'belt', 'belwo', 'ben', 'benamor', 'bench', 'beneath', 'beneficial', 'benefit', 'benelthyux', 'benethrytte', 'bengtjamin', 'benign', 'benjamtrhdyin', 'benoittry', 'benz', 'benã', 'beosjgxt', 'bereits', 'berfkting', 'bernardo', 'bertes', 'bertsckaadyd', 'beschichten', 'beschichtungsleitstand', 'beschreibung', 'beshryu', 'beshryued', 'beshryulisted', 'beshryulists', 'beshryuout', 'beshryuwire', 'beside', 'best', 'bestand', 'bestellnumer', 'bestellungen', 'betreff', 'betshdy', 'bettery', 'bettymcdanghtnuell', 'between', 'betwenn', 'beuflorc', 'bev', 'bex', 'beyhtcykea', 'beyklcmj', 'beyond', 'bf', 'bfb', 'bfbfc', 'bfca', 'bfckamsg', 'bfda', 'bfeecda', 'bfeecdaaccaaca', 'bff', 'bfghabu', 'bfhjtuiwell', 'bfiwanze', 'bfnackrw', 'bfnvjgxd', 'bfrgtonersp', 'bfrx', 'bgdxitwu', 'bgflmyar', 'bgfmrltw', 'bghrbie', 'bgpedtqc', 'bgqpotek', 'bgtyrant', 'bgwneavl', 'bgyluoqn', 'bh', 'bhatyr', 'bhayhtrathramdnty', 'bhergtyemm', 'bhghtyum', 'bhjqvtzm', ...]